BEEFY: Disarm finality notifications to prevent pinning #5129

skunert · 2024-07-24T16:28:18Z

This should prevent excessive pinning of blocks while we are waiting for the block ancestry to be downloaded after gap sync.
We spawn a new task that gets polled to transform finality notifications into an unpinned counterpart. Before this PR, finality notifications were kept in the notification channel. This led to pinning cache overflows.

fixes #4389

bkchr

Not really super happy, but it should work.

substrate/client/consensus/beefy/src/lib.rs

Co-authored-by: Bastian Köcher <git@kchr.de>

acatangiu · 2024-07-24T17:09:47Z

substrate/client/consensus/beefy/src/lib.rs

+			_ = &mut transformer => {
+				error!(target: LOG_TARGET, "🥩 Finality notification transformer task has unexpectedly terminated.");
+			}


You could also drop this transformer task once BEEFY initialized and worker starts, the worker consumes finality notifications as they come in so no block pinning issues from that point on.
Not really important, but saves us an extra task/future that's running forever and copies data from pinned->unpinned for no functional reason.

That would be an option, true. However I think the impact of this additional future is pretty slim, since we are just using intra-task concurrency here. If we were to go for this route, we would need to re-initialize the transformer in case of a consensus reset where we start the loop again from the beginning. Also we would need to do some kind of "hand-over" for the streams to make absolutely sure that we don't miss any finality notifications. Since this raises the complexity I opted for the most simple solution here.

serban300 · 2024-07-25T08:20:05Z

substrate/client/consensus/beefy/src/lib.rs

+	let finality_notifications = client.finality_notification_stream();
+	let (mut transformer, mut finality_notifications) =
+		finality_notification_transformer_future(finality_notifications);


Would it work if we just did something like:

let mut finality_notifications = client .finality_notification_stream() .map(UnpinnedFinalityNotification::from) .fuse();

?

And in the function definitions we could use:

finality_notifications: &mut Fuse< impl Stream<Item = UnpinnedFinalityNotification<B>> + Unpin, >

Would it work if we just did something like:

let mut finality_notifications = client .finality_notification_stream() .map(UnpinnedFinalityNotification::from) .fuse();

?

This would map the stream, but not solve our problem. The mapping takes place when the next item is retrieved from the stream. Our problem is that next() is not called while we are waiting for the header ancestry to become available. We need an extra future that runs concurrently for this mapping to happen in the background.

Oh, I see, you're right. This makes sense. But, at a first glance, personally I think it would be better if instead we had a wrapper over the finality notifications stream with some internal state. Something like:

FinalityStream { inner: original finality stream, mode: Limited | Full queue: queued notifications while in Limited mode }

We could have it in Limited mode while waiting for the header ancestry to become available, and then switching it to Full. In Limited mode we can just take the notifications, transform them to unpinned and push them to a queue. We still need to pump it while waiting for async_initialize() to finish, just as you do with the transformer here.

Also maybe we could drop all the finality notifications, except for the ones for mandatory headers while in Limited mode ?

Anyway, not a blocker on my side. If the current approach works, I'm ok with merging it, and then I can experiment with this wrapper.

I mean we could do that, but I am not sure if there is any benefit. This transformer is already super simple and gets the job done. I think we can merge the current approach and then iterate on it, for now my priority is to not pin unnecessary blocks. Right now I am warp syncing some nodes to make sure nothing unexpected happens.

Also maybe we could drop all the finality notifications, except for the ones for mandatory headers while in
Limited mode ?

I was experimenting with this and think its worth following up on this idea. However there are some fields that we set based on the finality notifications (even non-mandatory) and since I am just getting familiar with the code, I opted for the simplest-yet-working approach.

Yes we can follow-up with micro-optimizations like removing the extra async task in the middle once initialization is complete. For now, this is good for fixing the issue.

PR just needs prdoc and is good to go from my point of view.

Will add that as soon as my nodes synced correctly. Then we can merge 👍

) This should prevent excessive pinning of blocks while we are waiting for the block ancestry to be downloaded after gap sync. We spawn a new task that gets polled to transform finality notifications into an unpinned counterpart. Before this PR, finality notifications were kept in the notification channel. This led to pinning cache overflows. fixes paritytech#4389 --------- Co-authored-by: Bastian Köcher <git@kchr.de>

While working on #5129 I noticed that after warp sync, nodes would print: ``` 2024-07-29 17:59:23.898 ERROR ⋮beefy: 🥩 Error: ConsensusReset. Restarting voter. ``` After some debugging I found that we enter the following loop: 1. Wait for beefy pallet to be available: Pallet is detected available directly after warp sync since we are at the tip. 2. Wait for headers from tip to beefy genesis to be available: During this time we don't process finality notifications, since we later want to inspect all the headers for authority set changes. 3. Gap sync finishes, route to beefy genesis is available. 4. The worker starts acting, tries to fetch beefy genesis block. It fails, since we are acting on old finality notifications where the state is already pruned. 5. Whole beefy subsystem is being restarted, loading the state from db again and iterating a lot of headers. This already happened before #5129.

While working on paritytech#5129 I noticed that after warp sync, nodes would print: ``` 2024-07-29 17:59:23.898 ERROR ⋮beefy: 🥩 Error: ConsensusReset. Restarting voter. ``` After some debugging I found that we enter the following loop: 1. Wait for beefy pallet to be available: Pallet is detected available directly after warp sync since we are at the tip. 2. Wait for headers from tip to beefy genesis to be available: During this time we don't process finality notifications, since we later want to inspect all the headers for authority set changes. 3. Gap sync finishes, route to beefy genesis is available. 4. The worker starts acting, tries to fetch beefy genesis block. It fails, since we are acting on old finality notifications where the state is already pruned. 5. Whole beefy subsystem is being restarted, loading the state from db again and iterating a lot of headers. This already happened before paritytech#5129.

skunert added 2 commits July 24, 2024 18:07

Prevent pinning in beefy finality notifications

252991d

Remove pub visibility

99c0a93

skunert added the T0-node This PR/Issue is related to the topic “node”. label Jul 24, 2024

skunert requested review from serban300, acatangiu and a team July 24, 2024 16:28

bkchr approved these changes Jul 24, 2024

View reviewed changes

substrate/client/consensus/beefy/src/lib.rs Outdated Show resolved Hide resolved

Update lib.rs

2df0ee0

Co-authored-by: Bastian Köcher <git@kchr.de>

acatangiu approved these changes Jul 24, 2024

View reviewed changes

lexnv approved these changes Jul 25, 2024

View reviewed changes

serban300 reviewed Jul 25, 2024

View reviewed changes

serban300 approved these changes Jul 25, 2024

View reviewed changes

skunert added 4 commits July 26, 2024 12:19

prdoc

d84a0ff

Merge branch 'master' into skunert/beefy-simplified

424aa4a

Fix prdoc

1b830ae

Merge branch 'master' into skunert/beefy-simplified

218c3ed

skunert enabled auto-merge July 26, 2024 14:09

skunert added this pull request to the merge queue Jul 26, 2024

Merged via the queue into paritytech:master with commit 5dc0670 Jul 26, 2024
159 of 161 checks passed

skunert deleted the skunert/beefy-simplified branch July 26, 2024 14:57

skunert mentioned this pull request Jul 31, 2024

beefy: Tolerate pruned state on runtime API call #5197

Merged

ggwpez mentioned this pull request Sep 9, 2024

Parent issue for stable2409 LTS release #5583

Closed

github-actions bot mentioned this pull request Oct 8, 2024

Update polkadot-sdk from stable2407 to stable2409 moonbeam-foundation/moonbeam#2994

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BEEFY: Disarm finality notifications to prevent pinning #5129

BEEFY: Disarm finality notifications to prevent pinning #5129

skunert commented Jul 24, 2024

bkchr left a comment

acatangiu Jul 24, 2024

skunert Jul 25, 2024

serban300 Jul 25, 2024 •

edited

Loading

skunert Jul 25, 2024

serban300 Jul 25, 2024

skunert Jul 25, 2024

acatangiu Jul 25, 2024

skunert Jul 25, 2024

BEEFY: Disarm finality notifications to prevent pinning #5129

BEEFY: Disarm finality notifications to prevent pinning #5129

Conversation

skunert commented Jul 24, 2024

bkchr left a comment

Choose a reason for hiding this comment

acatangiu Jul 24, 2024

Choose a reason for hiding this comment

skunert Jul 25, 2024

Choose a reason for hiding this comment

serban300 Jul 25, 2024 • edited Loading

Choose a reason for hiding this comment

skunert Jul 25, 2024

Choose a reason for hiding this comment

serban300 Jul 25, 2024

Choose a reason for hiding this comment

skunert Jul 25, 2024

Choose a reason for hiding this comment

acatangiu Jul 25, 2024

Choose a reason for hiding this comment

skunert Jul 25, 2024

Choose a reason for hiding this comment

serban300 Jul 25, 2024 •

edited

Loading